In this paper, we propose a spreading activation approach for collaborative filtering (SA-CF). By using the opinion spreading process, the similarity between any pair of users can be obtained. The algorithm has remarkably higher accuracy than the standard collaborative filtering (CF) using Pearson correlation. Furthermore, we introduce a free parameter $\beta$ to regulate the contributions of objects to user-user correlations. The numerical results indicate that decreasing the influence of popular objects can further improve the algorithmic accuracy and personality. We argue that a better algorithm should simultaneously require less computation and generate higher accuracy. Accordingly, we further propose an algorithm involving only the top-$N$ similar neighbors of each target user, which has both lower computational complexity and higher algorithmic accuracy.
1. Introduction

With the advent of the Internet, the exponential growth of the World-Wide-Web and routers confront people with an information overload. We are facing too much data to be able to effectively filter out the pieces of information that are most appropriate for us. A promising way is to provide personal recommendations that filter the information for us. Recommendation systems use the opinions of users to help them more effectively identify content of interest from a potentially overwhelming set of choices [2]. Motivated by their practical significance to e-commerce and society, various kinds of algorithms have been proposed, such as correlation-based methods, content-based methods, spectral analysis, principal component analysis, network-based methods, and so on. For a review of current progress, see Ref. […] and the references therein.

One of the most successful technologies for recommendation systems, called collaborative filtering (CF), has been developed and extensively investigated over the past decade. When predicting the potential interests of a given user, this approach first identifies a set of similar users from past records and then makes a prediction based on a weighted combination of those similar users' opinions. Despite its wide applications, collaborative filtering suffers from several major limitations, including system scalability and accuracy. Recently, some physical dynamics, including mass diffusion, heat conduction and a trust-based model, have found applications in personal recommendations. These physical approaches have been demonstrated to have both high accuracy and low computational complexity. However, the algorithmic accuracy and computational complexity may be very sensitive to the statistics of the data sets. For example, the algorithm presented in Ref. […] runs much faster than standard CF if the number of users is much larger than the number of objects, whereas when the number of objects is huge, the advantage of this algorithm vanishes because its complexity is mainly determined by the number of objects (see Ref. […] for details). In order to increase the system scalability and accuracy of standard CF, we introduce a network-based recommendation algorithm with spreading activation, namely SA-CF. In addition, two free parameters, $\beta$ and $N$, are introduced to increase the accuracy and personality.



2. Method 

Denoting the object set as $O = \{o_1, o_2, \cdots, o_n\}$ and the user set as $U = \{u_1, u_2, \cdots, u_m\}$, a recommendation system can be fully described by an adjacency matrix $A = \{a_{ij}\} \in \mathbb{R}^{n \times m}$, where $a_{ij} = 1$ if $o_i$ is collected by $u_j$, and $a_{ij} = 0$ otherwise. For a given user, a recommendation algorithm generates a ranking of all the objects he/she has not collected before.

Based on the user-object matrix $A$, a user similarity network can be constructed, where each node represents a user, and two users are connected if and only if they have collected at least one common object. In the standard CF, the similarity between $u_i$ and $u_j$ can be evaluated directly by a correlation function:

$$ s_{ij} = \frac{\sum_{l=1}^{n} a_{li}\, a_{lj}}{\min\{k(u_i),\, k(u_j)\}}, \tag{1} $$

where $k(u_i) = \sum_{l=1}^{n} a_{li}$ is the degree of user $u_i$. Inspired by the diffusion process presented by Zhou et al., we assume that a certain amount of resource (e.g., recommendation power) is associated with each user, and the weight $s_{ij}$ represents the proportion of the resource that $u_j$ would like to distribute to $u_i$. Following a network-based resource-allocation process, in which each user distributes his/her initial resource equally among all the objects he/she has collected, and then each object sends back what it has received equally to all the users who collected it, the weight $s_{ij}$ (the fraction of the initial resource that $u_j$ eventually gives to $u_i$) can be expressed as:

$$ s_{ij} = \frac{1}{k(u_j)} \sum_{l=1}^{n} \frac{a_{li}\, a_{lj}}{k(o_l)}, \tag{2} $$

where $k(o_l) = \sum_{i=1}^{m} a_{li}$ denotes the degree of object $o_l$. Using the spreading process, the user correlation network can be constructed, whose edge weights are given by Eq. (2). For a user-object pair $(u_i, o_j)$, if $u_i$ has not yet collected $o_j$ (i.e., $a_{ji} = 0$), the predicted score $v_{ij}$ is given as

$$ v_{ij} = \frac{\sum_{l=1,\, l \neq i}^{m} s_{li}\, a_{jl}}{\sum_{l=1,\, l \neq i}^{m} s_{li}}. \tag{3} $$
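The two similarity definitions can be compared on a small example. The sketch below is a minimal illustration, assuming a hypothetical dictionary representation (user → set of collected objects) in place of the matrix $A$; the helper names `object_degrees`, `cf_similarity` and `sa_similarity` are ours, not the paper's. It shows that Eq. (1) is symmetric whereas Eq. (2) is not, and that the resource leaving $u_j$ is conserved, i.e. $\sum_i s_{ij} = 1$ when $i$ also runs over $u_j$ itself.

```python
from collections import defaultdict


def object_degrees(collections):
    """k(o_l): number of users who collected object o_l."""
    deg = defaultdict(int)
    for objs in collections.values():
        for o in objs:
            deg[o] += 1
    return deg


def cf_similarity(collections, ui, uj):
    """Eq. (1): number of common objects divided by min{k(u_i), k(u_j)}."""
    common = len(collections[ui] & collections[uj])
    return common / min(len(collections[ui]), len(collections[uj]))


def sa_similarity(collections, ui, uj, k_obj):
    """Eq. (2): fraction of u_j's unit resource that reaches u_i after
    spreading u_j -> objects -> users."""
    common = collections[ui] & collections[uj]
    return sum(1.0 / k_obj[o] for o in common) / len(collections[uj])


# Hypothetical toy data: user -> set of collected objects.
toy = {
    "u1": {"a", "b", "c"},
    "u2": {"a", "b"},
    "u3": {"c", "d"},
}
k_obj = object_degrees(toy)

print(cf_similarity(toy, "u1", "u2"), cf_similarity(toy, "u2", "u1"))  # 1.0 1.0
print(sa_similarity(toy, "u1", "u2", k_obj))  # s_12 = 0.5
print(sa_similarity(toy, "u2", "u1", k_obj))  # s_21 = 1/3
# The resource leaving u1 is conserved (the sum includes s_11).
print(sum(sa_similarity(toy, ui, "u1", k_obj) for ui in toy))
```

The set-based representation is chosen only for readability; for large data sets a sparse matrix would be the natural choice.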

From the definition of Eq. (3), one can see that, for a target user, the collection information of all of his/her neighbors affects the recommendation results, which differs from the definition of reachability. Based on the definitions of $s_{ij}$ and $v_{ij}$, SA-CF can be formulated. The framework of the algorithm is organized as follows: (I) calculate the user similarity matrix $\{s_{ij}\}$ based on the spreading approach; (II) for each user $u_i$, obtain the score $v_{ij}$ of every object $o_j$ not yet collected by $u_i$; (III) sort the uncollected objects in descending order of $v_{ij}$; those at the top are recommended.
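Steps (I) to (III) can be sketched end to end for a single target user. This is a minimal, unoptimized illustration under stated assumptions: the function name `sa_cf_recommend`, the dictionary data format and the tie-breaking rule (by object id) are hypothetical choices of ours, and every user is assumed to have collected at least one object.

```python
from collections import defaultdict


def sa_cf_recommend(collections, target, top_l=None):
    """Rank the objects not yet collected by `target` (hypothetical helper).

    Assumes every user has collected at least one object.
    """
    # Object degrees k(o_l) from the training data.
    k_obj = defaultdict(int)
    for objs in collections.values():
        for o in objs:
            k_obj[o] += 1

    k_target = len(collections[target])
    # Step (I): s_li from Eq. (2), the share of the target's resource reaching u_l.
    weights = {
        ul: sum(1.0 / k_obj[o] for o in objs & collections[target]) / k_target
        for ul, objs in collections.items()
        if ul != target
    }
    norm = sum(weights.values())

    # Step (II): predicted score v_ij of Eq. (3) for every uncollected object o_j.
    scores = {}
    for oj in set(k_obj) - collections[target]:
        num = sum(w for ul, w in weights.items() if oj in collections[ul])
        scores[oj] = num / norm if norm > 0 else 0.0

    # Step (III): descending order of v_ij; ties broken by object id.
    ranking = sorted(scores, key=lambda o: (-scores[o], o))
    return ranking if top_l is None else ranking[:top_l]


toy = {
    "u1": {"a", "b", "c"},
    "u2": {"a", "b"},
    "u3": {"c", "d"},
}
print(sa_cf_recommend(toy, "u2"))  # ['c', 'd']
print(sa_cf_recommend(toy, "u3"))  # ['a', 'b']
```

Passing `top_l=None` returns the full ordered queue of uncollected objects, which is what the ranking-score evaluation below requires; a finite `top_l` gives a recommendation list of length $L$.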



3. Numerical results 

To test the algorithmic accuracy and personality, we use a benchmark data set, namely MovieLens. The data consist of 1682 movies (objects) and 943 users, who vote for movies using discrete ratings from 1 to 5. Hence we apply the coarse-graining method previously used in Refs. […]: a movie is regarded as collected by a user only if the given rating is larger than 2. The original data contain $10^5$ ratings, 85.25% of which are $\geq 3$; thus the user-object (user-movie) bipartite network after coarse-graining contains 85250 edges. To test the recommendation algorithms, the data set is randomly divided into two parts: the training set contains 90% of the data, and the remaining 10% constitutes the probe set. The training set is treated as known information, while no information in the probe set is allowed to be used for prediction.
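The preprocessing can be illustrated as follows. This is a sketch only: it assumes a hypothetical list of `(user, movie, rating)` tuples rather than the actual MovieLens file format, and the helper names `coarse_grain` and `split_train_probe` are ours. The split here is taken over user-movie edges; the text does not specify how users or movies that end up only in the probe set are handled, and this sketch does not handle them either.

```python
import random


def coarse_grain(ratings, threshold=2):
    """Keep (user, movie) pairs rated above `threshold`, i.e. ratings 3, 4 or 5."""
    return [(user, movie) for user, movie, rating in ratings if rating > threshold]


def split_train_probe(edges, probe_fraction=0.1, seed=None):
    """Randomly split edges into a training set and a probe set."""
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    n_probe = int(round(len(shuffled) * probe_fraction))
    return shuffled[n_probe:], shuffled[:n_probe]


# Hypothetical (user, movie, rating) triples.
ratings = [("u1", "m1", 5), ("u1", "m2", 2), ("u2", "m1", 3), ("u2", "m3", 1)]
edges = coarse_grain(ratings)
print(edges)  # [('u1', 'm1'), ('u2', 'm1')]

train, probe = split_train_probe([(f"u{i}", "m1") for i in range(20)], seed=0)
print(len(train), len(probe))  # 18 2
```

Averaging over ten independent runs, as done for the figures, corresponds to calling `split_train_probe` with ten different seeds.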

A recommendation algorithm should provide each user with an ordered queue of all his/her uncollected objects. It should be emphasized that the length of the queue should not be set artificially, because the number of uncollected movies differs from user to user. For an arbitrary user $u_i$, if the entry $(u_i, o_j)$ is in the probe set (according to the training set, $o_j$ is an uncollected object for $u_i$), we measure the position of $o_j$ in the ordered queue. For example, if there are $L_i = 100$ uncollected movies for $u_i$, and $o_j$ is the 10th from the top, we say the position of $o_j$ is $10/L_i$, denoted by $r_{ij} = 0.1$. Since the probe entries are actually collected by users, a good algorithm is expected to rank them highly, thus leading to small $r_{ij}$. Therefore, the mean value of the position $r_{ij}$, $\langle r \rangle$ (called the ranking score), averaged over all the entries in the probe set, can be used to evaluate the algorithmic accuracy: the smaller the ranking score, the higher the algorithmic accuracy, and vice versa. Implementing SA-CF and CF, the average values of the ranking score are $0.12187 \pm 0.02406$ and $0.13069 \pm 0.0571$, respectively. Clearly, under the simplest initial configuration, SA-CF outperforms the standard CF in terms of algorithmic accuracy.
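The ranking score can be computed directly from its definition. In this minimal sketch, the function name `ranking_score` and the input format (a per-user ordered queue and a list of probe pairs) are our assumptions; the position is 1-based, matching the example in the text where the 10th of $L_i = 100$ objects gives $r_{ij} = 0.1$.

```python
def ranking_score(rankings, probe):
    """Mean relative position <r> of the probe entries (smaller is better).

    rankings: user -> that user's uncollected objects, best first
    probe:    (user, object) pairs held out from the training set
    """
    positions = []
    for user, obj in probe:
        queue = rankings[user]
        # r_ij = (1-based position of o_j in the queue) / L_i
        positions.append((queue.index(obj) + 1) / len(queue))
    return sum(positions) / len(positions)


# The example from the text: L_i = 100 and o_j is the 10th from the top.
rankings = {"u1": [f"m{k}" for k in range(1, 101)]}
print(ranking_score(rankings, [("u1", "m10")]))  # 0.1
```

`list.index` makes each lookup linear in $L_i$; for many probe entries per user, precomputing a position dictionary per queue would be cheaper.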



4. Two modified algorithms 

In order to further improve the algorithmic accuracy, we propose two modified methods. Similar to Ref. […], taking into account the potential role of object degree may give better performance. Accordingly, instead of Eq. (2), we introduce a more general way to compute the user-user correlation:

$$ s_{ij} = \frac{1}{k(u_j)} \sum_{l=1}^{n} \frac{a_{li}\, a_{lj}}{k^{\beta}(o_l)}, \tag{4} $$

where $\beta$ is a tunable parameter. When $\beta = 1$, this method degenerates to the algorithm described in the previous section. The case with $\beta > 1$ weakens the contribution of large-degree objects to the user-user correlation, while $\beta < 1$ enhances the contribution of large-degree objects. According to our daily experience, if two users $u_i$ and $u_j$ have both collected a very popular object (with a very large degree), it does not mean that their interests are similar; on the contrary, if two users have both collected an unpopular object (with a very small degree), it is very likely that they share some common and particular tastes. Therefore, we expect a larger $\beta$ (i.e., $\beta > 1$) to lead to higher accuracy than the routine case $\beta = 1$.

Fig. 1. (Color online) $\langle r \rangle$ vs. $\beta$. The black solid and red dashed curves represent the performances of SA-CF and CF, respectively. All the data points are obtained by averaging over ten independent runs with different data-set divisions.
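The effect of $\beta$ can be seen on a toy case. The sketch below assumes the $\beta$-weighted form $s_{ij} = \frac{1}{k(u_j)} \sum_l a_{li} a_{lj} / k^{\beta}(o_l)$; the function name `beta_similarity` and the data are hypothetical. Two neighbors of $u_1$ are compared: $u_2$ shares only a popular object, while $u_3$ additionally shares a niche one. Their similarity ratio is $1 + 2^{\beta}$, so raising $\beta$ makes the niche-sharing neighbor relatively more important.

```python
from collections import defaultdict


def beta_similarity(collections, ui, uj, beta=1.0):
    """Each common object o_l contributes k(o_l)**(-beta), normalized by k(u_j).

    beta = 1 recovers Eq. (2); beta > 1 down-weights popular objects.
    """
    k_obj = defaultdict(int)
    for objs in collections.values():
        for o in objs:
            k_obj[o] += 1
    common = collections[ui] & collections[uj]
    return sum(k_obj[o] ** -beta for o in common) / len(collections[uj])


# Hypothetical data: "pop" is collected by all four users, "niche" by two.
toy = {
    "u1": {"pop", "niche"},
    "u2": {"pop"},
    "u3": {"pop", "niche"},
    "u4": {"pop"},
}
for beta in (1.0, 2.0):
    shares_pop = beta_similarity(toy, "u2", "u1", beta)   # u2 shares only "pop"
    shares_both = beta_similarity(toy, "u3", "u1", beta)  # u3 also shares "niche"
    print(beta, shares_both / shares_pop)  # ratio 1 + 2**beta: 3.0, then 5.0
```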

Fig. 1 reports the algorithmic accuracy as a function of $\beta$. The curve has a clear minimum around $\beta = 1.9$, which strongly supports the above argument. Compared with the routine case ($\beta = 1$), the ranking score is further reduced by 11.2% at the optimal value, a considerable improvement for recommendation algorithms.
Besides accuracy, the average degree of all recommended movies, $\langle k \rangle$, and the mean Hamming distance, $S$, are used to measure the algorithmic personality. Movies with higher degrees are more popular than those with smaller degrees, so a personalized recommendation should yield a small $\langle k \rangle$ to fit the particular tastes of different users. Fig. 2 reports the average degree of all recommended movies as a function of $\beta$. One can see from Fig. 2 that the average degree is negatively correlated with $\beta$; thus, depressing the recommendation power of high-degree objects gives more opportunities to unpopular objects. The Hamming distance, $S = \langle H_{ij} \rangle$, is defined as the mean value of $H_{ij}$ over all pairs of recommendation lists of users $u_i$ and $u_j$, where $H_{ij} = 1 - Q/L$, $L$ is the list length, and $Q$ is the number of objects common to the two users' recommendation lists. Fig. 3 shows the positive correlation between $S$ and $\beta$, in accordance with the simulation results in Fig. 2, which indicates that depressing the influence of high-degree objects makes the recommendations more personal. The above simulation results indicate that SA-CF outperforms CF in terms of both accuracy and personality.

Fig. 2. (Color online) The average degree of all recommended movies, $\langle k \rangle$, vs. $\beta$. The black solid, red dashed and green dotted curves represent the cases with typical lengths $L = 10$, 20 and 50, respectively. The blue dotted line corresponds to the optimal value $\beta_{\rm opt} = 1.9$. All the data points are obtained by averaging over ten independent runs with different data-set divisions.
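Both personality measures follow directly from their definitions. This is a minimal sketch under assumptions of ours: recommendation lists are given as a hypothetical dictionary of top-$L$ lists of equal length, object degrees are supplied separately, and the helper names `average_degree` and `hamming_distance` are illustrative.

```python
from itertools import combinations


def average_degree(rec_lists, k_obj):
    """<k>: mean degree of all objects in the users' recommendation lists."""
    degrees = [k_obj[o] for lst in rec_lists.values() for o in lst]
    return sum(degrees) / len(degrees)


def hamming_distance(rec_lists):
    """S = <H_ij> over all user pairs, with H_ij = 1 - Q/L.

    All lists are assumed to have the same length L.
    """
    values = []
    for li, lj in combinations(rec_lists.values(), 2):
        q = len(set(li) & set(lj))  # Q: objects common to both lists
        values.append(1 - q / len(li))
    return sum(values) / len(values)


# Hypothetical top-L lists with L = 2 and hypothetical object degrees.
rec_lists = {"u1": ["a", "b"], "u2": ["a", "c"], "u3": ["d", "e"]}
k_obj = {"a": 10, "b": 2, "c": 4, "d": 1, "e": 3}
print(average_degree(rec_lists, k_obj))  # 5.0
print(hamming_distance(rec_lists))       # (0.5 + 1 + 1) / 3
```

$S = 0$ means all users receive identical lists and $S = 1$ means no two lists overlap, so larger $S$ indicates more personalized recommendations.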

Fig. 3. (Color online) $S$ vs. $\beta$. The black solid, red dashed and green dotted curves represent the cases with typical lengths $L = 10$, 20 and 50, respectively. The blue dotted line corresponds to the optimal value $\beta_{\rm opt} = 1.9$. All the data points are obtained by averaging over ten independent runs with different data-set divisions.

Besides the algorithmic accuracy and personality, the computational complexity should also be taken into account. Indeed, we argue that a better algorithm should simultaneously require less computation and generate higher accuracy. Note that the computational cost of Eq. (3) is very high if the number of users, $m$, is huge. In fact, the majority of user-user similarities are very small and contribute little to the final recommendation. However, those inconsequential terms, corresponding to less similar users, dominate the computational time of Eq. (3). Therefore, we propose a modified algorithm, the so-called top-$N$ SA-CF, which only considers the information of the $N$ users most similar to any given user. That is, in top-$N$ SA-CF, the sums in Eq. (3) run over only the $N$ most similar users of $u_i$. While calculating the similarity matrix $\{s_{ij}\}$, we can simultaneously record the most similar users of each user. When $m \gg N$, the additional computing time for finding the top-$N$ similar users is remarkably shorter than the time saved in evaluating Eq. (3). More surprisingly, as shown in Fig. 4, with a properly chosen $N$, this algorithm not only reduces the computation but also enhances the algorithmic accuracy. This property is of practical significance, especially for huge recommender systems. From Figs. 2 and 3, one can find that the correlations of $\langle k \rangle$ and $S$ with $\beta$ behave differently in different $\beta$ ranges, which may indicate a phase transition. Since this paper mainly focuses on the accuracy and personality of recommendation algorithms, this issue will be investigated in future work.
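The top-$N$ restriction changes only which users enter the two sums of Eq. (3). The sketch below is an illustration, not the authors' implementation: the name `top_n_scores`, the data format and the tie-breaking rule are hypothetical, and for simplicity it computes all $s_{li}$ first and then selects the $N$ largest with `heapq.nlargest` (roughly $O(m \log N)$ per user), rather than recording them during the similarity computation as described above.

```python
import heapq
from collections import defaultdict


def top_n_scores(collections, target, n_neighbors):
    """Top-N variant of Eq. (3): only the N users with the largest s_li
    enter the numerator and the denominator."""
    k_obj = defaultdict(int)
    for objs in collections.values():
        for o in objs:
            k_obj[o] += 1
    k_target = len(collections[target])
    # s_li from Eq. (2) for every other user u_l.
    sims = {
        ul: sum(1.0 / k_obj[o] for o in objs & collections[target]) / k_target
        for ul, objs in collections.items()
        if ul != target
    }
    # Keep the N most similar users (ties broken deterministically by user id).
    neighbors = heapq.nlargest(n_neighbors, sims.items(), key=lambda kv: (kv[1], kv[0]))
    norm = sum(w for _, w in neighbors)
    scores = {}
    for oj in set(k_obj) - collections[target]:
        num = sum(w for ul, w in neighbors if oj in collections[ul])
        scores[oj] = num / norm if norm > 0 else 0.0
    return scores


toy = {
    "u1": {"a", "b", "c"},
    "u2": {"a", "b", "d"},
    "u3": {"a", "e"},
    "u4": {"f"},
}
print(top_n_scores(toy, "u1", 1))  # only u2 is used: d scores 1.0, e and f 0.0
print(top_n_scores(toy, "u1", 3))  # all other users: e now gets a positive score
```

With $N = m - 1$ the result coincides with the full Eq. (3), consistent with the Fig. 4 observation that large $N$ recovers the SA-CF accuracy.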

5. Conclusions 

In this paper, a spreading activation approach, named SA-CF, is presented to compute the user similarity in the collaborative filtering algorithm. The basic SA-CF has clearly higher accuracy than the standard CF. Ignoring the degree-degree correlations in user-object relations, the algorithmic complexity of SA-CF is $O(m\langle k_u\rangle\langle k_o\rangle + mn\langle k_o\rangle)$, where $\langle k_u\rangle$ and $\langle k_o\rangle$ denote the average degrees of users and objects, respectively. Correspondingly, the algorithmic complexity of the standard CF is $O(m^2\langle k_u\rangle + mn\langle k_o\rangle)$, where the first term accounts for the calculation of the similarities between users, and the second term accounts for the calculation of the predictions.
Fig. 4. $\langle r \rangle$ vs. $N$. The inset shows the relation for larger $N$. Clearly, when $N$ approaches $m$, the algorithmic accuracy is the same as that of SA-CF, with $\langle r \rangle = 0.12187 \pm 0.02406$. All the data points are obtained by averaging over ten independent runs with different data-set divisions.



14, 2009 15:39 WSPC/INSTRUCTION FILE ijmpc 



8 Jian-Guo Liu et. al. 

In reality, the number of users, $m$, is much larger than the average object degree, $\langle k_o \rangle$; therefore, the computational complexity of SA-CF is much lower than that of the standard CF. SA-CF thus has great potential significance in practice.
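The two complexity expressions can be evaluated for a data set of the size used here. This back-of-envelope sketch uses the MovieLens sizes quoted in Section 3 (full coarse-grained data, not the 90% training set); the function name `complexity_estimates` is hypothetical, constant factors are dropped, and the numbers are operation-count estimates, not measured running times. The advantage of SA-CF lies in the similarity term, whose ratio to the CF similarity term is $m/\langle k_o\rangle$; the prediction term $mn\langle k_o\rangle$ is common to both.

```python
def complexity_estimates(m, n, edges):
    """Leading-order operation counts, constant factors dropped."""
    k_u = edges / m  # average user degree <k_u>
    k_o = edges / n  # average object degree <k_o>
    sa_cf = m * k_u * k_o + m * n * k_o
    cf = m ** 2 * k_u + m * n * k_o
    # Ratio of the similarity terms: m^2<k_u> / (m<k_u><k_o>) = m / <k_o>.
    sim_ratio = (m ** 2 * k_u) / (m * k_u * k_o)
    return sa_cf, cf, sim_ratio


# Sizes quoted for MovieLens after coarse-graining.
sa_cf, cf, sim_ratio = complexity_estimates(m=943, n=1682, edges=85250)
print(f"{sa_cf:.3g} {cf:.3g} {sim_ratio:.1f}")  # 8.47e+07 1.61e+08 18.6
```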

Furthermore, we proposed two modified algorithms based on SA-CF. The first algorithm weakens the contribution of large-degree objects to user-user correlations, and the second one eliminates the influence of less similar users. Both modified algorithms can further enhance the accuracy of SA-CF. More significantly, with a proper choice of the parameter $N$, top-$N$ SA-CF can simultaneously reduce the computational complexity and improve the algorithmic accuracy.

A natural question about the presented algorithms is whether they are robust to other data sets or to random recommendation. For SA-CF, the answer is yes, because it obtains the user similarity more accurately. For the two modified algorithms, however, the answer is no, since both introduce tunable parameters ($\beta$ and $N$) whose optimal values differ between data sets. Future work will focus on finding an effective way to obtain the optimal values exactly, so that the modified algorithms can be implemented more easily.